Main Results
Treatment Effect on Identifying Manipulation
Summary Results - Identifying Manipulation
- After our course, participants became more skeptical of all information.
- Treated participants labelled manipulative posts as 12.6 percentage points (0.63 on our 1-6 scale) more manipulative.
- They also labelled non-manipulative posts 15.4 percentage points (0.77 on our 1-6 scale) more manipulative.
- Treated participants were more confident when labelling manipulative posts as manipulative than when labelling non-manipulative posts.
H1 - Identifying Manipulativeness
H1 states that participants will be more capable of correctly rating misinformation as manipulative after taking the course.
Note that, by construction, we test this hypothesis using only the test questions that actually contain manipulative tactics; SH1 examines the non-manipulative test questions.
The table below summarizes the output of the standard t.test function in R:
- `estimate` represents the difference in group means (treated vs. control)
- `estimate1` is the treated group's mean delta
- `estimate2` is the control group's mean delta
- `statistic` is the test statistic
- `p.value` is the unadjusted p-value
- `p.value_adjusted` is the p-value adjusted for 10 tests using the BH correction
- `conf.low` and `conf.high` are the bounds of a 95% confidence interval
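For intuition on the BH adjustment behind `p.value_adjusted`, here is a minimal, self-contained sketch; the raw p-values are made up purely for illustration:

```r
# Illustrative only: the BH (Benjamini-Hochberg) correction as applied via
# p.adjust(). Each p-value is scaled by n / rank (here n = 10 planned tests),
# and the adjusted values are then made monotone.
p_raw <- c(0.001, 0.01, 0.03, 0.04)  # made-up p-values
p.adjust(p_raw, method = "BH", n = 10)
```

Setting `n = 10` tells `p.adjust` to correct as if all 10 planned tests were in the family, even though only a subset of p-values is supplied.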
# Calculations for this in a later section
alpha_corrected = 0.007 # calculated via bootstrap simulation on control data
conf_level = 1 - alpha_corrected/2

t.test(
df_attention %>% filter(treated == 1) %>% pull(mean_Delta_1_False),
df_attention %>% filter(treated == 0) %>% pull(mean_Delta_1_False),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.63 0.74 0.11 6.55 0 436. 0.35 0.91
These results indicate that those who take our tactics course rate manipulative posts 0.63 points more manipulative on our 6 point scale. This is 12.6 percentage points (\(0.63/5\)) of our 1-6 point scale and is significant before and after multiple testing adjustments. This suggests that participants became more capable of identifying misinformation after taking the course.
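The derivation of `alpha_corrected` appears in a later section. Purely as a hypothetical sketch of what a bootstrap calibration on control data could look like (the placebo-split logic and replication count here are illustrative assumptions, not the actual procedure):

```r
# Hypothetical sketch (not the actual procedure): simulate null "effects" by
# repeatedly splitting the control group into two placebo arms, then inspect
# the resulting placebo p-value distribution to choose a stricter alpha.
library(dplyr)

control_scores <- df_attention %>%
  filter(treated == 0) %>%
  pull(mean_Delta_1_False)

placebo_p <- replicate(2000, {
  arm <- sample(c(TRUE, FALSE), length(control_scores), replace = TRUE)
  t.test(control_scores[arm], control_scores[!arm])$p.value
})

# low quantiles of the placebo distribution suggest candidate alphas
quantile(placebo_p, probs = c(0.005, 0.01, 0.05))
```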
Our graphs in this and the following section share a common design: each line represents the transition from the pre-test (left-hand point) to the post-test (right-hand point). The bold line represents treatment and the grey line represents control, so a bold line with a positive slope indicates an increase in the y value from pre to post.
pre_post_individual <- df_attention %>%
mutate(
pre_mean = (
MisGraph1_Pre_1 + FalseComp1_Pre_1 + Anecdotes1_Pre_1 + Combined1_Pre_1
) / 4,
post_mean = (
MisGraph1_Post_1 + FalseComp1_Post_1 + Anecdotes1_Post_1 + Combined1_Post_1
) / 4
)
t.test(pre_post_individual %>% filter(treated == 1) %>% pull(pre_mean)) %>% broom::tidy() %>% mutate(period = "pre", treated = "1") %>%
rbind(
t.test(pre_post_individual %>% filter(treated == 0) %>% pull(pre_mean)) %>% broom::tidy() %>% mutate(period = "pre", treated = "0")
) %>%
rbind(
t.test(pre_post_individual %>% filter(treated == 1) %>% pull(post_mean)) %>% broom::tidy() %>% mutate(period = "post", treated = "1")
) %>%
rbind(
t.test(pre_post_individual %>% filter(treated == 0) %>% pull(post_mean)) %>% broom::tidy() %>% mutate(period = "post", treated = "0")
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.05)) +
geom_line(aes(group = treated)) +
scale_color_grey() +
labs(
title = "H1: Identifying Manipulativeness",
subtitle = "Average score on scale 1-6, with 6 being most manipulative",
x = "pre- or post-treatment period",
y = "Average score"
) +
theme_classic()

# Calculations for this in a later section
alpha_corrected = 0.007 # calculated via bootstrap simulation on control data
conf_level = 1 - alpha_corrected/2

For reference, if we also penalize participants for identifying factual questions as manipulative, then this result is dampened (as illustrated below). This is discussed further with SH1.
t.test(
df_attention %>% filter(treated == 1) %>% pull(mean_Delta_1),
df_attention %>% filter(treated == 0) %>% pull(mean_Delta_1),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.16 0.29 0.13 2.15 0.03 420. -0.06 0.38
SH1 - Misidentifying True Content
SH1 states that participants will not identify true content as more manipulative after taking the course.
Those who take our tactics course also rate true posts 0.77 points more manipulative on our 1-6 point scale (15.4 percentage points, \(0.77/5\)). In other words, our treatment also caused people to be more suspicious of factual information.
This effect may be a result of priming or experimenter demand, and might not last as long as the knowledge participants gained about identifying misinformation. We cannot distinguish whether one of these effects is driving the results or whether participants simply became more skeptical of all information. In any case, this is something First Draft should be aware of. This finding is plotted in the figure above alongside the result for test questions with misinformation. As illustrated, treated participants became more likely to identify both true and false statements as misinformation in the post-test.
pre_post_individual <- df_attention %>%
mutate(
pre_mean_true = ((7 - True1_Pre_1) + (7 - True2_Pre_1)) / 2,
pre_mean_false = (
MisGraph1_Pre_1 + FalseComp1_Pre_1 + Anecdotes1_Pre_1 + Combined1_Pre_1
) / 4,
post_mean_true = ((7 - True1_Post_1) + (7 - True2_Post_1)) / 2,
post_mean_false = (
MisGraph1_Post_1 + FalseComp1_Post_1 + Anecdotes1_Post_1 + Combined1_Post_1
) / 4
)
t.test(pre_post_individual %>% filter(treated == 1) %>% pull(pre_mean_true)) %>% broom::tidy() %>% mutate(period = "pre",
treated = "1",
type = "Not Misinformation") %>%
rbind(
t.test(
pre_post_individual %>% filter(treated == 0) %>% pull(pre_mean_true)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Not Misinformation"
)
) %>%
rbind(
t.test(
pre_post_individual %>% filter(treated == 1) %>% pull(post_mean_true)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Not Misinformation"
)
) %>%
rbind(
t.test(
pre_post_individual %>% filter(treated == 0) %>% pull(post_mean_true)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Not Misinformation"
)
) %>%
rbind(
t.test(
pre_post_individual %>% filter(treated == 1) %>% pull(pre_mean_false)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Misinformation"
)
) %>%
rbind(
t.test(
pre_post_individual %>% filter(treated == 0) %>% pull(pre_mean_false)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Misinformation"
)
) %>%
rbind(
t.test(
pre_post_individual %>% filter(treated == 1) %>% pull(post_mean_false)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Misinformation"
)
) %>%
rbind(
t.test(
pre_post_individual %>% filter(treated == 0) %>% pull(post_mean_false)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Misinformation"
)
) %>%
  mutate(period = factor(period, levels = c("pre", "post")),
         treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(
title = "SH1: Misinformation Identification by True & False Questions",
subtitle = "Average score on scale 1-6, with 6 being most manipulative",
x = "pre- or post-treatment period",
y = "Manipulativeness Score (1-6)"
) +
theme_classic() +
  facet_wrap(. ~ type)

t.test(
df_attention %>% filter(treated == 1) %>% pull(mean_Delta_1_True),
df_attention %>% filter(treated == 0) %>% pull(mean_Delta_1_True),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.77 -0.61 0.16 -5.33 0 464. -1.2 -0.35
We try to reduce variance with covariate adjustments and other checks. A summary of those results is shown in the table below, confirming the results above. See the Additional Analyses section for more details on variance reduction.
Breakdown by Question
To get a better understanding of treatment effects on different types of questions, we look at the change in manipulativeness for each question category.
We see that the graph questions result in the largest increase in manipulative answers for both manipulative and non-manipulative questions (True1 has a valid graph).
pre_post_individual %>%
mutate(
True1_Pre_1 = 7 - True1_Pre_1,
True2_Pre_1 = 7 - True2_Pre_1,
True1_Post_1 = 7 - True1_Post_1,
True2_Post_1 = 7 - True2_Post_1
) %>%
dplyr::select(
treated,
MisGraph1_Pre_1,
FalseComp1_Pre_1,
Anecdotes1_Pre_1,
Combined1_Pre_1,
MisGraph1_Post_1,
FalseComp1_Post_1,
Anecdotes1_Post_1,
Combined1_Post_1,
True1_Pre_1,
True2_Pre_1,
True1_Post_1,
True2_Post_1
) %>%
filter(treated == 1) %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(., conf.level = conf_level)), .id = 'var') %>%
mutate(treated = "1") %>%
rbind(
pre_post_individual %>%
mutate(
True1_Pre_1 = 7 - True1_Pre_1,
True2_Pre_1 = 7 - True2_Pre_1,
True1_Post_1 = 7 - True1_Post_1,
True2_Post_1 = 7 - True2_Post_1
) %>%
dplyr::select(
treated,
MisGraph1_Pre_1,
FalseComp1_Pre_1,
Anecdotes1_Pre_1,
Combined1_Pre_1,
MisGraph1_Post_1,
FalseComp1_Post_1,
Anecdotes1_Post_1,
Combined1_Post_1,
True1_Pre_1,
True2_Pre_1,
True1_Post_1,
True2_Post_1
) %>%
filter(treated == 0) %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(., conf.level = conf_level)), .id = 'var') %>%
mutate(treated = "0")
) %>%
separate(var, into = c("type", "period")) %>%
mutate(period = factor(period, levels = c("Pre", "Post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(
title = "Identifying Manipulativeness by Question",
subtitle = "Average score on scale 1-6, with 6 being most manipulative",
x = "pre- or post-treatment period",
y = "Average score"
) +
theme_classic() +
  facet_wrap(. ~ type)

Deep Dive into Score Distributions
We explore the score distributions of the pre- and post-tests to gain a better understanding of participant responses. We look at the treated group’s pre-treatment distribution of answers to non-manipulative questions and manipulative questions.
As shown, pre-test scores for non-manipulative posts are skewed right, whereas scores for manipulative posts are relatively uniform.
In the post-test distribution, however, there is a clear shift towards "6 - definitely manipulative" for many of the manipulative posts. There is also a shift for non-manipulative posts; while the sizes of the changes are similar on average, they are not as concentrated at the value 6.
df_attention %>%
select(
treated,
True1_Pre_1,
True1_Post_1,
True2_Pre_1,
True2_Post_1,
MisGraph1_Pre_1,
MisGraph1_Post_1,
Anecdotes1_Pre_1,
Anecdotes1_Post_1,
FalseComp1_Pre_1,
FalseComp1_Post_1,
Combined1_Pre_1,
Combined1_Post_1
) %>%
mutate(
True1_Pre_1 = 7 - True1_Pre_1,
True1_Post_1 = 7 - True1_Post_1,
True2_Pre_1 = 7 - True2_Pre_1,
True2_Post_1 = 7 - True2_Post_1
) %>%
pivot_longer(
!treated,
names_to = "question",
values_to = "Score"
) %>%
mutate(
is_pre = if_else(str_detect(question, "Pre"), '1) pre-test', '2) post-test'),
question_name = str_extract(question, '[^_]*'),
question_type = ifelse(str_detect(question_name, 'True'), 'Non-Manipulative Posts', 'Manipulative Posts')
) %>%
filter(treated == 1) %>%
group_by(question_type) %>%
ggplot(aes(x=Score, group=is_pre, fill=is_pre)) +
# geom_bar(aes(y=..prop..), position=position_dodge()) +
geom_bar(aes(y=..prop..), position='identity') +
scale_fill_manual(values = alpha(c("#E69F00","#0072B2"), .5)) +
facet_wrap(question_type ~ .) +
ggtitle('Score Distributions for Treated') +
scale_x_continuous(breaks = seq(1, 6, by = 1)) +
  ylab('proportion')

Non-Linearity
Although we see a relatively large effect of misidentifying non-manipulative posts, we also suspect that the 6-point manipulativeness scale may not be linear. It is probably easier for participants to move from 2 to 3 than from 5 to 6, since it is easier to become somewhat less confident than to become very confident.
To explore this more formally, we binarized the scores to approximate participants' intentions when selecting different values. We wanted to find a suitable cutoff at which users really felt a change in manipulativeness.
We use the pre-test distributions to find where on the scale the distributions of answers to the true posts and to the manipulative posts differ the most. For each potential binary split, we compute the difference between the share of answers falling above the split for the true posts versus the manipulative posts, and we choose the split where this difference is greatest. Our goal is to find the best binary split in terms of how people perceive the scale. This method is not ideal, as it is still not a direct measure of perception, but it is the best we can do given our data.
df %>%
dplyr::select(
'True1_Pre_1',
'True2_Pre_1',
'MisGraph1_Pre_1',
'Anecdotes1_Pre_1',
'FalseComp1_Pre_1',
'Combined1_Pre_1'
) %>%
mutate(True1_Pre_1 = 7 - True1_Pre_1,
True2_Pre_1 = 7 - True2_Pre_1) %>%
pivot_longer(
c(
'True1_Pre_1',
'True2_Pre_1',
'MisGraph1_Pre_1',
'Anecdotes1_Pre_1',
'FalseComp1_Pre_1',
'Combined1_Pre_1'
),
names_to = "question",
values_to = "value"
) %>%
mutate(
is_true = question %in% c('True1_Pre_1',
'True2_Pre_1'),
split_lte_1 = value <= 1,
split_lte_2 = value <= 2,
split_lte_3 = value <= 3,
split_lte_4 = value <= 4,
split_lte_5 = value <= 5
) %>%
# group_by(is_true) %>%
pivot_longer(
c(
'split_lte_1',
'split_lte_2',
'split_lte_3',
'split_lte_4',
'split_lte_5'
),
names_to = "split",
values_to = "split_value"
) %>%
group_by(is_true, split) %>%
mutate(split_total = n()) %>%
ungroup() %>%
group_by(is_true, split, split_value) %>%
mutate(num_in_split = n(),
pct_in_split = num_in_split / split_total) %>%
ungroup() %>%
dplyr::select(is_true, split, split_value, pct_in_split) %>%
group_by(is_true, split) %>%
pivot_wider(names_from = split_value,
values_from = pct_in_split,
values_fn = mean) %>%
ungroup() %>%
mutate(pct_above = `FALSE`,
pct_diff = `TRUE` - `FALSE`) %>%
dplyr::select(is_true,
split,
pct_above) %>%
group_by(split) %>%
pivot_wider(names_from = is_true,
values_from = pct_above) %>%
ungroup() %>%
rename(
split_less_than_equal_to = split,
true_questions_above = `TRUE`,
false_questions_above = `FALSE`
) %>%
mutate(
true_questions_above = round(true_questions_above, 4),
false_questions_above = round(false_questions_above, 4),
abs_diff = round(abs(
true_questions_above - false_questions_above
), 4)
  )

## # A tibble: 5 × 4
## split_less_than_equal_to true_questions_above false_questions_above abs_diff
## <chr> <dbl> <dbl> <dbl>
## 1 split_lte_1 0.708 0.845 0.137
## 2 split_lte_2 0.600 0.777 0.178
## 3 split_lte_3 0.442 0.608 0.166
## 4 split_lte_4 0.245 0.381 0.136
## 5 split_lte_5 0.139 0.221 0.0825
We see that the optimal split is at a score of 2. Note that the optimal split would have needed to be greater than the scale midpoint of 3 for the treatment effect on non-manipulative posts to shrink: manipulative posts already have higher scores on average, so only a higher cutoff would leave the non-manipulative posts with less room to change. Thus, a split below 3 does not change the direction of our insights: skepticism toward both manipulative and non-manipulative posts still increased after our treatment, even under this binary metric.
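As a hedged sketch of what re-running the treated-vs-control comparison on binarized scores could look like (using the score > 2 cutoff from the table above; the `MisGraph1` columns stand in for a full average over the manipulative questions):

```r
# Illustrative sketch: binarize at the chosen split (a score above 2 counts
# as "manipulative") and re-test the treated-vs-control difference in deltas.
library(dplyr)

df_binarized <- df_attention %>%
  mutate(
    bin_pre   = as.numeric(MisGraph1_Pre_1 > 2),
    bin_post  = as.numeric(MisGraph1_Post_1 > 2),
    bin_delta = bin_post - bin_pre
  )

t.test(
  df_binarized %>% filter(treated == 1) %>% pull(bin_delta),
  df_binarized %>% filter(treated == 0) %>% pull(bin_delta),
  conf.level = conf_level
)
```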
t.test(
df_attention %>% filter(treated == 1) %>% pull(mean_Delta_1_False),
df_attention %>% filter(treated == 0) %>% pull(mean_Delta_1_False),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.63 0.74 0.11 6.55 0 436. 0.35 0.91
Tactics
Summary Results - Tactics
- Treated participants could better identify tactics when they are present in the post.
- They were also more likely to incorrectly state that a tactic was used when it was not in fact present in the post.
- Treated participants were more likely to select more tactics after the course, but only the tactics that we had taught them.
- There wasn’t a significant decrease in precision, indicating that the treated participants were not randomly selecting more tactics. It is likely that participants selected the correct tactic, then added a few other tactics with lower confidence just in case. This hypothesis would need to be validated in a future experiment.
- The treatment caused participants to be more skeptical of true posts. Their overall performance on the true posts decreased, offsetting the improved performance on manipulative posts. This result is comparable to the results above on overall perception of manipulativeness, and may be driven by the same potential causes: priming effects, experimenter demand effects, or a general increase in skepticism. In the case of tactic misidentification, the result may also be due to the inherent trickiness of our true posts.
Metrics Definitions
We evaluate multiple metrics to understand the behavior and understanding gained by participants in relation to our tactics.
In this section, we define the metrics for our analysis using a toy example.
Consider a case with 2 posts: a true post that should be answered “true” and an anecdote post that should be answered as “misleading anecdote.”
For each post, the user has 2 checkboxes that they could select: “true” or “misleading anecdote”. The user could select 1 or both checkboxes per post.
Let’s assume the user only picked “misleading anecdote” as their choice for both of these posts (shown below using yellow shading).
Accuracy measures the share of checkboxes for which the user correctly identified the presence or absence of a specific tactic. This participant’s accuracy is 0.5.
Note that the correct “tactic” for the true post is “True”.
True Positive Rate (Recall) measures the share of posts containing the tactic for which the user identified it. This participant’s true positive rate is 0.5.
False positive rate measures the share of posts not containing the tactic for which the user mistakenly chose it. This participant’s false positive rate is 0.5.
Note once again that the correct “tactic” for the true post is “True”.
Precision measures the share of the user’s tactic selections in which the tactic is actually present. This participant’s precision is 0.5.
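The four metrics for the toy participant can be computed directly. This small sketch encodes the two posts as four checkbox decisions (the vector names are illustrative, not actual analysis variables):

```r
# Toy example: a "true" post and an "anecdote" post, each with two checkboxes
# ("true", "misleading anecdote"); the participant ticks only "misleading
# anecdote" on both posts.
truth  <- c(true_post_true = 1, true_post_anec = 0,
            anec_post_true = 0, anec_post_anec = 1)
chosen <- c(true_post_true = 0, true_post_anec = 1,
            anec_post_true = 0, anec_post_anec = 1)

tp <- sum(chosen == 1 & truth == 1)   # 1
fp <- sum(chosen == 1 & truth == 0)   # 1
tn <- sum(chosen == 0 & truth == 0)   # 1
fn <- sum(chosen == 0 & truth == 1)   # 1

accuracy  <- (tp + tn) / length(truth)   # 0.5
recall    <- tp / (tp + fn)              # 0.5 (true positive rate)
fpr       <- fp / (fp + tn)              # 0.5
precision <- tp / (tp + fp)              # 0.5
```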
Metrics Overview for Tactics and Questions
The following figures plot the accuracy, true positive rate, false positive rate, and precision of the participants’ abilities to identify individual manipulative tactics in the test questions.
The first figure plots accuracy, which measures the overall performance for all the tactics. We notice a non-significant drop in accuracy across all three tactics.
t.test(df_attention %>%
filter(treated == 1) %>%
pull(True1_Pre_2_accuracy), conf.level = conf_level) %>%
broom::tidy() %>%
mutate(period = "pre",
treated = "1",
type = "True") %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(True1_Pre_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "True"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(True1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "True"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(True1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "True"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Pre_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Pre_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Pre_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Pre_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Pre_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Pre_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Post_2_accuracy), conf.level = conf_level
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Anecdotes"
)
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(title = "Accuracy Across Tactics",
x = "pre- or post-treatment period",
y = "Tactic Accuracy") +
theme_classic() +
  facet_wrap(. ~ type)

To better understand the lack of change in accuracy, we break accuracy down into the true positive rate and the false positive rate. Here we see significant increases both in the ability to identify tactics when they are present (true positive rate) and in mistakes made when tactics are absent (false positive rate). The increase in false positive rate is driven by an increase in the number of tactics selected after treatment (see the Number of Tactics Selected section below). Since the correct answers consisted primarily of unchecked boxes, selecting more boxes yields more false positives and can decrease overall accuracy. The biggest contributor to the drop in accuracy is mistakes made on true posts (see the Unpenalized Tactics Identification section below for more detail).
recall_by_tactic <- t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Pre_2_recall)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Misleading Graphs"
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Pre_2_recall)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Post_2_recall)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Post_2_recall)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Pre_2_recall)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Pre_2_recall)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Post_2_recall)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Post_2_recall)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Pre_2_recall)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Pre_2_recall)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Post_2_recall)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Post_2_recall)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Anecdotes"
)
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(
title = "True Positive Rate (Recall) Across Tactics",
x = "pre- or post-treatment period",
y = "Tactic Recall") +
theme_classic() +
theme(axis.title.x=element_blank()) +
  facet_wrap(. ~ type)

fpr_by_tactic <- t.test(
  df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Pre_2_fpr)
) %>% broom::tidy() %>% mutate(
  period = "pre",
  treated = "1",
  type = "Misleading Graphs"
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Pre_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Post_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Post_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Pre_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Pre_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Post_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Post_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Pre_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Pre_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Post_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Post_2_fpr)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Anecdotes"
)
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(
title = "False Positive Rate Across Tactics",
x = "pre- or post-treatment period",
y = "Tactic FPR") +
theme_classic() +
  facet_wrap(. ~ type)

grid.arrange(recall_by_tactic, fpr_by_tactic, ncol=1)

If participants were randomly selecting more checkboxes after treatment, we would see a drop in precision, since the number of false positives would grow much faster. This, however, is not the case: our participants were not simply ticking more boxes at random.
precision_by_tactic <- t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Pre_2_precision)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Misleading Graphs"
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Pre_2_precision)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Post_2_precision)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Post_2_precision)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Misleading Graphs"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Pre_2_precision)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Pre_2_precision)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Post_2_precision)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Post_2_precision)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "False Comparisons"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Pre_2_precision)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Pre_2_precision)
) %>% broom::tidy() %>% mutate(
period = "pre",
treated = "0",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Post_2_precision)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "1",
type = "Anecdotes"
)
) %>%
rbind(
t.test(
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Post_2_precision)
) %>% broom::tidy() %>% mutate(
period = "post",
treated = "0",
type = "Anecdotes"
)
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(
title = "Precision Across Tactics",
x = "pre- or post-treatment period",
y = "Tactic Precision") +
theme_classic() +
facet_wrap(. ~ type)
precision_by_tactic

Number of Tactics Selected
We see that the treated group selects significantly more checkboxes in the post-test.
t.test(df %>% filter(treated == 1) %>% pull(n_tactics_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "1") %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_tactics_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "0")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_tactics_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "1")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_tactics_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "0")
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.05)) +
geom_line(aes(group = treated)) +
scale_color_grey() +
labs(
title = "Number of Tactics Checkboxes Selected",
x = "pre- or post-treatment period",
y = "score"
) +
theme_classic()

t.test(
df_attention %>% filter(treated == 1) %>% pull(n_tactics_delta),
df_attention %>% filter(treated == 0) %>% pull(n_tactics_delta)
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1.63 2.14 0.5 6.84 0 576. 1.16 2.1
## # … with 1 more variable: p.value_adjusted <dbl>
The increase in checkboxes selected is concentrated in the tactics we taught in the course. The increase for the placebo tactic we included was not significant. This indicates that participants recognized, at least to an extent, the specific tactics they were taught.
t.test(df %>% filter(treated == 1) %>% pull(n_misgraph_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "1", type = "Misleading Graphs") %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_misgraph_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "0", type = "Misleading Graphs")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_misgraph_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "1", type = "Misleading Graphs")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_misgraph_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "0", type = "Misleading Graphs")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_falsecomp_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "1", type = "False Comparisons")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_falsecomp_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "0", type = "False Comparisons")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_falsecomp_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "1", type = "False Comparisons")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_falsecomp_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "0", type = "False Comparisons")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_anecdote_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "1", type = "Anecdotes")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_anecdote_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "0", type = "Anecdotes")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_anecdote_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "1", type = "Anecdotes")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_anecdote_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "0", type = "Anecdotes")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_evidence_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "1", type = "Evidence out of Context")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_evidence_pre)) %>% broom::tidy() %>% mutate(period = "pre", treated = "0", type = "Evidence out of Context")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(n_evidence_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "1", type = "Evidence out of Context")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(n_evidence_post)) %>% broom::tidy() %>% mutate(period = "post", treated = "0", type = "Evidence out of Context")
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.05)) +
geom_line(aes(group = treated)) +
scale_color_grey() +
labs(
title = "Number of Tactics Checkboxes Selected",
subtitle = "By Tactic",
x = "pre- or post-treatment period",
y = "score"
) +
theme_classic() +
facet_wrap(. ~ type)

t.test(
df_attention %>% filter(treated == 1) %>% pull(n_evidence_delta),
df_attention %>% filter(treated == 0) %>% pull(n_evidence_delta),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.12 0.36 0.24 1.03 0.3 494. -0.22 0.45
## # … with 1 more variable: p.value_adjusted <dbl>
As a reference, we also checked whether there was a significant increase in selecting all three tactics for each question.
In general, it looks like this behavior only occurred for the True Graph question (which makes some sense given it contained true anecdotes and comparisons). The absence of this behavior in the other questions suggests that participants are being somewhat deliberate in the tactics they select for each question.
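The select-all indicator behind the `*_selectall` columns amounts to checking every box for a question. A minimal sketch (hypothetical helper name; the actual construction may differ):

```r
# Sketch of a select-all indicator: 1 when a participant checks every one
# of the three tactic boxes for a question, 0 otherwise.
select_all <- function(selected) as.integer(all(selected))

select_all(c(TRUE, TRUE, TRUE))   # indiscriminate checking
select_all(c(TRUE, FALSE, TRUE))  # deliberate selection
```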
pre_post_individual %>%
dplyr::select(
treated,
MisGraph1_Pre_2_selectall,
FalseComp1_Pre_2_selectall,
Anecdotes1_Pre_2_selectall,
Combined1_Pre_2_selectall,
MisGraph1_Post_2_selectall,
FalseComp1_Post_2_selectall,
Anecdotes1_Post_2_selectall,
Combined1_Post_2_selectall,
True1_Pre_2_selectall,
True2_Pre_2_selectall,
True1_Post_2_selectall,
True2_Post_2_selectall
) %>%
filter(treated == 1) %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(.)), .id = 'var') %>%
mutate(treated = "1") %>%
rbind(
pre_post_individual %>%
dplyr::select(
treated,
MisGraph1_Pre_2_selectall,
FalseComp1_Pre_2_selectall,
Anecdotes1_Pre_2_selectall,
Combined1_Pre_2_selectall,
MisGraph1_Post_2_selectall,
FalseComp1_Post_2_selectall,
Anecdotes1_Post_2_selectall,
Combined1_Post_2_selectall,
True1_Pre_2_selectall,
True2_Pre_2_selectall,
True1_Post_2_selectall,
True2_Post_2_selectall
) %>%
filter(treated == 0) %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(.)), .id = 'var') %>%
mutate(treated = "0")
) %>%
separate(var, into = c("type", "period")) %>%
mutate(period = factor(period, levels = c("Pre", "Post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(
title = "Select All 3 Tactics Per Question",
x = "pre- or post-treatment period",
y = "Average score"
) +
theme_classic() +
facet_wrap(. ~ type)

Unpenalized Tactics Identification
The previous metrics all penalized the participant for selecting too many checkboxes. However, neither the treatment nor the question prompt explicitly told participants that they would be penalized for selecting extra tactics. We therefore considered an alternative analysis without such a penalty.
Specifically, for each post, we gave the participant a score of 1 if they recognized that the post was not misleading or if they at least selected the correct misleading tactic for the post (without penalty for selecting other tactics). Otherwise, they scored a 0.
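A minimal sketch of this scoring rule (hypothetical helper; the actual implementation may differ):

```r
# Unpenalized binary score per post: 1 if a non-misleading post is left
# with no boxes checked, or if a misleading post has its actual tactic(s)
# checked (extra checkboxes carry no penalty); 0 otherwise.
binary_score <- function(selected, truth) {
  if (!any(truth)) {
    as.integer(!any(selected))        # true post: reward leaving all boxes empty
  } else {
    as.integer(all(selected[truth]))  # misleading post: the real tactic(s) must be checked
  }
}

binary_score(selected = c(TRUE, TRUE, FALSE), truth = c(TRUE, FALSE, FALSE))   # extra box is free
binary_score(selected = c(FALSE, TRUE, FALSE), truth = c(TRUE, FALSE, FALSE))  # wrong tactic only
```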
The overall performance is very noisy for this metric, so we broke it down by question.
t.test(df %>% filter(treated == 1) %>% pull(tactics_pre_2_binary)) %>% broom::tidy() %>% mutate(period = "pre", treated = "1") %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(tactics_pre_2_binary)) %>% broom::tidy() %>% mutate(period = "pre", treated = "0")
) %>%
rbind(
t.test(df %>% filter(treated == 1) %>% pull(tactics_post_2_binary)) %>% broom::tidy() %>% mutate(period = "post", treated = "1")
) %>%
rbind(
t.test(df %>% filter(treated == 0) %>% pull(tactics_post_2_binary)) %>% broom::tidy() %>% mutate(period = "post", treated = "0")
) %>%
mutate(period = factor(period, levels = c("pre", "post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.05)) +
geom_line(aes(group = treated)) +
scale_color_grey() +
labs(title = "Tactics Identification by Question (Binary without False Positive Penalty)",
# subtitle = "1 if true and no tactic is selected
# 1 if misleading and correct tactic selected (no FP penalty)
# 0 otherwise
# ",
x = "pre- or post-treatment period",
y = "score") +
theme_classic()

Broken down by question, there is a clear increase in performance on this metric for all the manipulative posts and a clear drop in performance for the true posts. This suggests that the noise in the aggregated graph above comes from the drop on true posts offsetting the gain on manipulative posts.
pre_post_individual %>%
dplyr::select(
treated,
MisGraph1_Pre_2_binary,
FalseComp1_Pre_2_binary,
Anecdotes1_Pre_2_binary,
Combined1_Pre_2_binary,
MisGraph1_Post_2_binary,
FalseComp1_Post_2_binary,
Anecdotes1_Post_2_binary,
Combined1_Post_2_binary,
True1_Pre_2_binary,
True2_Pre_2_binary,
True1_Post_2_binary,
True2_Post_2_binary
) %>%
filter(treated == 1) %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(.)), .id = 'var') %>%
mutate(treated = "1") %>%
rbind(
pre_post_individual %>%
dplyr::select(
treated,
MisGraph1_Pre_2_binary,
FalseComp1_Pre_2_binary,
Anecdotes1_Pre_2_binary,
Combined1_Pre_2_binary,
MisGraph1_Post_2_binary,
FalseComp1_Post_2_binary,
Anecdotes1_Post_2_binary,
Combined1_Post_2_binary,
True1_Pre_2_binary,
True2_Pre_2_binary,
True1_Post_2_binary,
True2_Post_2_binary
) %>%
filter(treated == 0) %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(.)), .id = 'var') %>%
mutate(treated = "0")
) %>%
separate(var, into = c("type", "period")) %>%
mutate(., type = c('MisGraph1' = 'Misleading Graph',
'Anecdotes1' = 'Anecdotes',
'FalseComp1' = 'False Comparisons',
'True1' = 'True (Graph)',
'True2' = 'True (No Graph)',
'Combined1' = 'Combined',
none = 'N')[type]) %>%
mutate(period = factor(period, levels = c("Pre", "Post")),
treated = factor(treated, levels = c("1", "0"))) %>%
ggplot(aes(
period,
estimate,
ymin = conf.low,
ymax = conf.high,
color = treated
)) +
geom_pointrange(position = position_dodge(.1)) +
geom_line(aes(group = interaction(treated, type))) +
scale_color_grey() +
labs(
title = "Tactics Identification by Question (Binary without False Positive Penalty)",
# subtitle = "Binarized (correct if the tactic is selected)",
x = "pre- or post-treatment period",
y = "Average score"
) +
theme_classic() +
facet_wrap(. ~ type)

t.test(
df_attention %>% filter(treated == 1) %>% pull(tactics_delta_2_binary),
df_attention %>% filter(treated == 0) %>% pull(tactics_delta_2_binary),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.13 0.43 0.3 1.02 0.31 442. -0.24 0.5
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Delta_2),
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Delta_2),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.19 0.26 0.07 3.46 0 434. 0.03 0.34
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Delta_2),
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Delta_2),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.17 0.19 0.02 3.24 0 433. 0.02 0.32
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Delta_2),
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Delta_2),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.12 0.13 0.01 2.34 0.02 464. -0.03 0.28
Summary of tactics results for each hypothesis
H2 - Misleading Graphs Tactic
H2 is that: participants will be more capable of identifying misleading graphs after taking the course.
Accuracy drops for misleading graphs, but the change is not statistically significant (although it sits near the threshold).
# Tactics Accuracy
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Delta_2_accuracy),
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Delta_2_accuracy),
conf.level = conf_level
) %>%
broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 8
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.03 -0.02 0.01 -1.75 0.08 447. -0.08 0.02
The true positive rate increases dramatically for misleading graphs. This change is statistically significant before and after multiple hypothesis corrections.
# Tactics Recall
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Delta_2_recall),
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Delta_2_recall),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.19 0.26 0.07 3.46 0 434. 0.03 0.34
## # … with 1 more variable: p.value_adjusted <dbl>
The false positive rate also increases dramatically for misleading graphs. This is also statistically significant before and after multiple hypothesis corrections.
# Tactics FPR
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Delta_2_fpr),
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Delta_2_fpr),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.07 0.08 0.01 4.05 0 462. 0.02 0.12
## # … with 1 more variable: p.value_adjusted <dbl>
Precision drops for misleading graphs, but this change is not statistically significant.
# Tactics Precision
t.test(
df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Delta_2_precision),
df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Delta_2_precision),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.06 -0.02 0.04 -0.93 0.36 167. -0.24 0.12
## # … with 1 more variable: p.value_adjusted <dbl>
# Binary version
# t.test(
# df_attention %>% filter(treated == 1) %>% pull(MisGraph1_Delta_2),
# df_attention %>% filter(treated == 0) %>% pull(MisGraph1_Delta_2)
# ) %>%
# broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
# mutate_if(is.numeric, round, digits = 2) %>%
# select(-method, -alternative)

H3 - Anecdotes Tactic
H3 is that: participants will be more capable of identifying anecdotes after taking the course.
Across the questions, we see the same drop in accuracy for identifying the anecdote tactic. However, this drop is not statistically significant.
# Tactics Accuracy
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Delta_2_accuracy),
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Delta_2_accuracy),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.02 -0.01 0.01 -1.16 0.25 446. -0.08 0.03
## # … with 1 more variable: p.value_adjusted <dbl>
The true positive rate increases for anecdotes, but this change is statistically significant only before multiple-hypothesis correction.
# Tactics Recall
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Delta_2_recall),
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Delta_2_recall),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.09 0.14 0.05 2.44 0.02 444. -0.02 0.19
## # … with 1 more variable: p.value_adjusted <dbl>
The false positive rate increases for misleading anecdotes. This change is statistically significant before and after multiple hypothesis corrections.
# Tactics FPR
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Delta_2_fpr),
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Delta_2_fpr),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.08 0.08 0 3.38 0 506. 0.01 0.14
## # … with 1 more variable: p.value_adjusted <dbl>
Precision drops slightly for anecdotes, but this change is not statistically significant.
# Tactics Precision
t.test(
df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Delta_2_precision),
df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Delta_2_precision),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.01 0.01 0.02 -0.17 0.87 233. -0.16 0.14
## # … with 1 more variable: p.value_adjusted <dbl>
# Binary version
# t.test(
# df_attention %>% filter(treated == 1) %>% pull(Anecdotes1_Delta_2),
# df_attention %>% filter(treated == 0) %>% pull(Anecdotes1_Delta_2)
# ) %>%
# broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
# mutate_if(is.numeric, round, digits = 2) %>%
# select(-method, -alternative)

H4 - False Comparisons Tactic
H4 is that: participants will be more capable of identifying false comparisons after taking the course.
Across the questions, we see the same drop in accuracy for identifying the false comparison tactic. This drop is statistically significant only before multiple-hypothesis correction.
# Tactics Accuracy
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Delta_2_accuracy),
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Delta_2_accuracy),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.05 -0.02 0.03 -2.44 0.02 448. -0.1 0.01
## # … with 1 more variable: p.value_adjusted <dbl>
The true positive rate increases for false comparisons. This change is not statistically significant.
# Tactics Recall
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Delta_2_recall),
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Delta_2_recall),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.06 0.11 0.05 1.53 0.13 464. -0.05 0.16
## # … with 1 more variable: p.value_adjusted <dbl>
The false positive rate increases for false comparisons. This change is statistically significant before and after corrections.
# Tactics FPR
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Delta_2_fpr),
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Delta_2_fpr),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.1 0.08 -0.02 4.39 0 483. 0.03 0.16
## # … with 1 more variable: p.value_adjusted <dbl>
Precision drops for false comparisons, but this change is not statistically significant.
# Tactics Precision
t.test(
df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Delta_2_precision),
df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Delta_2_precision),
conf.level = conf_level
) %>%
broom::tidy() %>%
mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
mutate_if(is.numeric, round, digits = 2) %>%
dplyr::select(-method, -alternative)

## # A tibble: 1 × 9
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.08 -0.01 0.07 -1.57 0.12 262. -0.22 0.07
## # … with 1 more variable: p.value_adjusted <dbl>
# Binary version
# t.test(
# df_attention %>% filter(treated == 1) %>% pull(FalseComp1_Delta_2),
# df_attention %>% filter(treated == 0) %>% pull(FalseComp1_Delta_2)
# ) %>%
# broom::tidy() %>%
# mutate(p.value_adjusted = p.adjust(p.value, method = "BH", n = 10)) %>%
# mutate_if(is.numeric, round, digits = 2) %>%
# select(-method, -alternative)

The results above are summarized below.
Heterogeneous Treatment Effects (HTE)
demographic_colnames <-
c(
'Gender',
'Race',
'Education',
'Income',
'Ideology',
'GeneralTrust',
'KnowFauci',
'TrustFauci',
'TrustScientists',
'NewsSources',
'PostFrequency',
'BlockUserSocialMedia',
'ReportUserSocialMedia',
'SeenManipulative',
'KnowSpot'
)

We first convert the categorical covariates into a numeric scale.
HTE_data <-
df_attention %>% mutate(Gender_numeric = case_when(Gender == 'Male' ~ 1,
Gender == 'Female' ~ 2,
TRUE ~ 3))
HTE_data <-
HTE_data %>% mutate(White = ifelse(Race == 'White', 1, 0))
group1_educ = c("Did not graduate from high school", "High school degree")
group2_educ = c("Associate degree", "Bachelor's degree", "Some collage")
HTE_data <- HTE_data %>%
mutate(
education_level_2 = case_when(
Education %in% group1_educ ~ 1,
Education %in% group2_educ ~ 2,
TRUE ~ 3
)
)
HTE_data <-
HTE_data %>% mutate(GeneralTrust_numeric = as.numeric(GeneralTrust))
HTE_data <-
HTE_data %>% mutate(FauciTrust_numeric = as.numeric(TrustFauci))
HTE_data <-
HTE_data %>% mutate(SciTrust_numeric = as.numeric(TrustScientists))
HTE_data <- HTE_data %>%
mutate(
PostFrequency_numeric = case_when(
PostFrequency == "Many times a day" ~ 5,
PostFrequency == "Daily" ~ 4,
PostFrequency == "Weekly" ~ 3,
PostFrequency == "Less than monthly" ~ 2,
PostFrequency == "Monthly" ~ 1,
PostFrequency == "I don't use social media" ~ 0
)
)
HTE_data <- HTE_data %>% mutate(
BlockUserSocialMedia_numeric = case_when(
BlockUserSocialMedia == 'Yes' ~ 3,
BlockUserSocialMedia == 'No' ~ 2,
TRUE ~ 1
)
)
HTE_data <- HTE_data %>% mutate(
ReportUserSocialMedia_numeric = case_when(
ReportUserSocialMedia == 'Yes' ~ 3,
ReportUserSocialMedia == 'No' ~ 2,
TRUE ~ 1
)
)
n <- nrow(HTE_data)

Summary Results - HTE
Heterogeneity in treatment effects is both a risk and an opportunity in intervention studies. On one hand, heterogeneity can help identify the groups for whom the treatment is most effective, so interventions can be targeted more efficiently. On the other hand, heterogeneity can mean the intervention is ineffective for part of the population. For example, if the intervention is more effective among politically liberal than conservative individuals, targeting liberals may yield more efficient outcomes, but the intervention then fails to improve conservative individuals' ability to identify misinformation.
However, we see little heterogeneity in the treatment effects; in fact, we find heterogeneity only across income levels. Again, this lack of heterogeneity gives a less precise answer as to whom to target with the intervention, but it also indicates that the treatment is likely appropriate for a broad audience.
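Each HTE test below follows the same interaction-regression pattern: regress the outcome delta on treatment, the covariate, and their interaction, and read heterogeneity off the interaction coefficient (the actual analyses additionally use HC1 robust standard errors via coeftest). A self-contained sketch on simulated data, with hypothetical variable names:

```r
# Sketch of the HTE pattern used in this section, on simulated data.
set.seed(1)
d <- data.frame(
  treated   = rep(0:1, each = 100),
  covariate = rnorm(200)
)
# Outcome with a real treatment effect (0.6) but no built-in heterogeneity:
d$y <- 0.6 * d$treated + 0.2 * d$covariate + rnorm(200)

model <- lm(y ~ treated * covariate, data = d)
# The treated:covariate coefficient is the heterogeneity estimate;
# here it should be near zero, since none was simulated.
coef(summary(model))["treated:covariate", ]
```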
H7 - Susceptibility to Misinformation
H7 is that: participants with different levels of susceptibility to misinformation at baseline will react differently to the treatment in terms of their overall ability to identify manipulative content.
We investigated this hypothesis using three measures of misinformation susceptibility from our survey:
- Participant performance on the pre-test (pre_score)
- Participants who self-identified as knowing how to spot manipulative techniques used in articles or headlines (KnowSpot)
- Participants who self-identified as having seen a manipulative news article or headline (SeenManipulative)
The results of HTE analysis based on these three variables are shown below. None of the effects are significant.
The results based on pre_score may indicate that our test questions were sufficiently neutral to remove any effect. Indeed, our test questions were deliberately designed not to be politically polarizing. Some prior literature has found HTE based on misinformation susceptibility using tailored indices (e.g., Maertens et al. (2021), Bruder et al. (2013)), although these indices were typically based on politically charged examples.
Our results based on KnowSpot and SeenManipulative may indicate that participants struggle to assess their own susceptibility, which has also been shown elsewhere (Rapp & Salovich (2018)).
# pre_score
HTE_data <- HTE_data %>% mutate(
pre_score = True1_Pre_1 + True2_Pre_1 + MisGraph1_Pre_1 + FalseComp1_Pre_1 + Anecdotes1_Pre_1 + Combined1_Pre_1
)
model_2 <- lm(mean_Delta_1_False ~ treated * pre_score, HTE_data)
results_2 <- coeftest(model_2, vcov = vcovHC(model_2, type = "HC1"), level = conf_level)
results_2

##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.677359 0.395764 6.7650 3.023e-11 ***
## treated1 0.363508 0.485522 0.7487 0.4543
## pre_score -0.108375 0.016231 -6.6771 5.310e-11 ***
## treated1:pre_score 0.010818 0.019702 0.5491 0.5832
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# KnowSpot
HTE_data <- HTE_data %>%
mutate(KnowSpot_numeric = case_when(KnowSpot == "No" ~ 1, # 124
KnowSpot == "Not sure" ~ 2, # 196
KnowSpot == "Yes" ~ 3 # 321
))
model_2 <-
lm(mean_Delta_1_False ~ treated * KnowSpot_numeric, HTE_data)
results_2 <-
coeftest(model_2, vcov = vcovHC(model_2, type = "HC1"), level = conf_level)
results_2

##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0003828 0.2280820 0.0017 0.998661
## treated1 0.7813068 0.2922862 2.6731 0.007708 **
## KnowSpot_numeric 0.0497286 0.0948265 0.5244 0.600171
## treated1:KnowSpot_numeric -0.0678871 0.1213787 -0.5593 0.576153
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# SeenManipulative
HTE_data <- HTE_data %>%
mutate(
SeenManipulative_numeric = case_when(
SeenManipulative == "No" ~ 1,
# 77
SeenManipulative == "Not sure" ~ 2,
# 158
SeenManipulative == "Yes" ~ 3 # 406
)
)
model_2 <-
lm(mean_Delta_1_False ~ treated * SeenManipulative_numeric,
HTE_data)
results_2 <- coeftest(model_2, vcov = vcovHC(model_2, type = "HC1"), level = conf_level)
results_2

##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.470230 0.309848 1.5176 0.1296
## treated1 0.098086 0.387452 0.2532 0.8002
## SeenManipulative_numeric -0.145762 0.117916 -1.2362 0.2169
## treated1:SeenManipulative_numeric 0.212992 0.146811 1.4508 0.1473
H8 - Political Ideology
H8 is that: participants with different political ideologies will react differently to the treatment in terms of their overall ability to identify manipulative content.
We investigated this hypothesis using a question that asked participants to self-identify as one of "very liberal," "moderately liberal," "moderate," "moderately conservative," or "very conservative." The results, shown below, indicate no significant HTEs. This likely reflects the fact that we deliberately chose example posts that would not be politically polarizing, and that we used the less polarized term manipulative information instead of misinformation.
HTE_data <- HTE_data %>%
mutate(
Ideology_numeric = case_when(
Ideology == "Very liberal" ~ -2,
Ideology == "Moderately liberal" ~ -1,
Ideology == "Moderate" ~ 0,
Ideology == "Moderately conservative" ~ 1,
Ideology == "Very conservative" ~ 2
)
)
model_2 <-
lm(mean_Delta_1_False ~ treated * Ideology_numeric, HTE_data)
results_2 <- coeftest(model_2, vcov = vcovHC(model_2, type = "HC1"), level = conf_level)
results_2
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.114960 0.077616 1.4811 0.1391
## treated1 0.625769 0.095728 6.5369 1.288e-10 ***
## Ideology_numeric -0.027031 0.061905 -0.4367 0.6625
## treated1:Ideology_numeric -0.040605 0.077109 -0.5266 0.5987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Possibly, had we used more politically polarizing content and/or treatments, we would have found heterogeneity by ideology. In the test we ran, however, we did not find a significant level of heterogeneity by ideology.
H9 - Income
H9 is that: participants with different levels of income will react differently to the treatment in terms of their overall ability to identify manipulative content.
We investigated this hypothesis using a question that asked participants to indicate which range best described their income:
- less than $25,000
- $25,000-$49,999
- $50,000-$74,999
- $75,000-$99,999
- $100,000-$149,999
- $150,000 or more.
We converted these categorical income buckets into an approximate continuous variable by taking the midpoint of each bucket. For the top bucket ($150,000 or more) we truncated the range at $200,000, giving a midpoint of $175,000. We then ran a regression interacting this continuous income variable with the treatment. The results below show that the treatment was more effective for lower-income individuals: on average, each $10,000 increase in income reduces the treatment effect by about 0.04, from a baseline effect of roughly 0.86 at an income of zero. This interaction is statistically significant at the 10% level.
This result is in line with prior literature (Pan, Liu & Fang (2021), Kricorian, Civen & Equils (2021)). It likely reflects the fact that higher income already tends to be associated with higher levels of education and verbal reasoning (Bjälkebring & Peters (2021)).
HTE_data <- HTE_data %>%
mutate(
Income_numeric = case_when(
Income == "Less than $25,000" ~ 1,
# 172
Income == "$25,000 to $49,999" ~ 2,
# 192
Income == "$50,000 to $74,999" ~ 3,
# 198
Income == "$75,000 to $99,999" ~ 3,
# 198
Income == "$100,000 to $149,999" ~ 4,
# 78
Income == "$150,000 or more" ~ 4 # 78
)
)
HTE_data <- HTE_data %>%
mutate(
Income_numeric_cont = case_when(
Income == "Less than $25,000" ~ 12500,
# 172
Income == "$25,000 to $49,999" ~ 37500,
# 192
Income == "$50,000 to $74,999" ~ 62500,
# 198
Income == "$75,000 to $99,999" ~ 87500,
# 198
Income == "$100,000 to $149,999" ~ 125000,
# 78
Income == "$150,000 or more" ~ 175000
)
)
model_2 <-
lm(mean_Delta_1_False ~ treated * Income_numeric_cont, HTE_data)
results_2 <- coeftest(model_2, vcov = vcovHC(model_2, type = "HC1"), level = conf_level)
results_2
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.2282e-02 1.3175e-01 -0.0932 0.92576
## treated1 8.6146e-01 1.6044e-01 5.3693 1.109e-07 ***
## Income_numeric_cont 2.3196e-06 1.8684e-06 1.2415 0.21487
## treated1:Income_numeric_cont -4.3378e-06 2.3776e-06 -1.8245 0.06855 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
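To make the per-dollar interaction coefficient easier to interpret, the implied treatment effect at each bucket midpoint can be computed directly from the coefficients printed above (a quick sketch; the values are the rounded estimates from the regression output, not a re-estimation):

```r
# Sketch: implied treatment effect at each income-bucket midpoint,
# computed from the (rounded) coefficients printed above.
b_treat    <- 0.86146      # treated1: effect at an income of zero
b_interact <- -4.3378e-06  # treated1:Income_numeric_cont (per dollar)
midpoints  <- c(12500, 37500, 62500, 87500, 125000, 175000)
round(b_treat + b_interact * midpoints, 2)
# implied effect falls from ~0.81 at $12,500 to ~0.10 at $175,000
```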
Next, we repeated this analysis keeping the original income buckets and estimating the interaction terms between the indicators for each income bucket and the treatment. The results are shown in the graph below, which likewise shows that the treatment is most effective for the lowest-income individuals. For the two lowest income brackets there is a statistically significant difference between the control and treatment groups, with the treatment helping participants detect misinformation. For the higher income groups, the improvement in the treated groups is not statistically significant relative to their respective control groups.
Income_level = c("Less than $25,000", "$25,000 to $49,999", "$50,000 to $74,999", "$75,000 to $99,999", "$100,000 to $149,999", "$150,000 or more")
HTE_data = HTE_data %>% mutate(income_level_cont = as.factor(Income_numeric_cont))
levels(HTE_data$income_level_cont) = Income_level
means_accuracy_ <- HTE_data %>%
# select our variables of interest
dplyr::select(treated, income_level_cont, mean_Delta_1_False) %>%
# group by treatment and income
group_by(treated, income_level_cont) %>%
summarize(mean = mean(mean_Delta_1_False),
sd = sd(mean_Delta_1_False),
n=n(),
se = sd/sqrt(n))
means_accuracy_ %>%
ggplot(aes(income_level_cont, mean, group = treated, color = treated)) +
geom_errorbar(aes(ymin = mean - 1.96*se, ymax = mean + 1.96*se,
linetype = income_level_cont),
width = .1,
position=position_dodge(width = .4), show.legend = TRUE) +
geom_point(aes(fill=treated), size =3, shape = c(20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25),
position=position_dodge2(width= .4)) +
scale_shape_manual(name = "Missing values", values=c(20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25)) +
scale_y_continuous(breaks = seq(-1, 1.7 , 0.2),
labels = c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6),
limits = c(-1, 1.8)) +
scale_x_discrete(labels = function(x)str_replace(str_replace_all(x, "\\_", " "), " ", "\\\n")) +
labs(title = "Means and 95% CI by Arm \n and income Level",
x = 'Treatment',
y = "Post - Pre manipulation detection",
caption = "For participants who passed the attention check (N = 641); 95% CI displayed.") +
theme_minimal() +
theme(strip.text = element_text(size = 12),
axis.text = element_text(size = 7),
axis.title = element_text(size = 12, face = "bold"),
#legend.position = "none",
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5))
The above approach shows results similar to those using the continuous income variable. Note that the estimates for higher-income individuals suggest the treatment may be effective for them as well; however, because of the smaller sample sizes in these groups, the estimates are noisier. Further studies could focus on these groups and ensure sufficient sample size to identify a potentially smaller effect size.
Additionally, because of the relatively low number of individuals in some of the higher income buckets, we repeated this analysis combining some of the groups. For the graph below, we combined the two highest and the two middle income groups. This third approach also confirms the finding above that our treatment was most effective for the lower income groups.
Income_level = c("Less than $25,000", "$25,000 to $49,999", "$50,000 to $99,999", "$100,000 or more")
HTE_data = HTE_data %>% mutate(income_level = as.factor(Income_numeric))
levels(HTE_data$income_level) = Income_level
means_accuracy_ <- HTE_data %>%
# select our variables of interest
dplyr::select(treated, income_level, mean_Delta_1_False) %>%
# group by treatment and income
group_by(treated, income_level) %>%
summarize(mean = mean(mean_Delta_1_False),
sd = sd(mean_Delta_1_False),
n=n(),
se = sd/sqrt(n))
means_accuracy_ %>%
ggplot(aes(income_level, mean, group = treated, color = treated)) +
geom_errorbar(aes(ymin = mean - 1.96*se, ymax = mean + 1.96*se,
linetype = income_level),
width = .1,
position=position_dodge(width = .4), show.legend = TRUE) +
geom_point(aes(fill=treated), size =3, shape = c(21, 21, 22, 22, 23, 23, 24, 24),
position=position_dodge2(width= .4)) +
scale_shape_manual(name = "Missing values", values=c(21, 21, 22, 22, 23, 23, 24, 24)) +
scale_y_continuous(breaks = seq(-0.6, 1.2 , 0.2),
labels = c(-0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
limits = c(-0.6, 1.1)) +
scale_x_discrete(labels = function(x)str_replace(str_replace_all(x, "\\_", " "), " ", "\\\n")) +
labs(title = "Means and 95% CI by Arm \n and income Level",
x = 'Treatment',
y = "Post - Pre manipulation detection",
caption = "For participants who passed the attention check (N = 641); 95% CI displayed.") +
theme_minimal() +
theme(strip.text = element_text(size = 12),
axis.text = element_text(size = 7),
axis.title = element_text(size = 12, face = "bold"),
#legend.position = "none",
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5))
Future analyses should examine whether this heterogeneity by income reflects some fundamental difference by income or is confounded by another variable, such as political knowledge, interest, or reasoning level. Understanding more specifically where this HTE comes from will be important for properly implementing the treatment in the field and for designing new content to reach people for whom the present treatment was less effective (i.e., higher-income individuals).
Causal Trees for HTE Analysis
Testing the three pre-specified HTEs, we found that our treatment appears to generalize well and to be roughly equally effective across groups, except for high-income individuals. We now turn to machine learning (ML) to study HTEs further. We expect ML to select combinations of covariates that group participants more effectively than a split on any single covariate, so ML-derived groups could show stronger HTEs. These subgroups can also inform new covariates or heterogeneous policy assignments for future surveys, and ML HTEs may reveal correlations between covariates that can inform future design decisions.
library(causalTree)
library(grf)
library(rpart)
library(glmnet)
library(splines)
library(MASS)
n <- nrow(HTE_data)
# Treatment indicator: 1 = treated, 0 = control
HTE_data = HTE_data %>% mutate(treatment = as.numeric(treated))
treatment <- "treatment"
# Outcome: pre/post change in manipulativeness ratings on manipulative posts
outcome <- "mean_Delta_1_False"
covariates <-
c(
'Gender_numeric',
'White',
'education_level_2',
'Income_numeric',
'Ideology_numeric',
'GeneralTrust_numeric',
'FauciTrust_numeric',
'SciTrust_numeric',
'PostFrequency_numeric',
'BlockUserSocialMedia_numeric',
'ReportUserSocialMedia_numeric',
'SeenManipulative_numeric',
'KnowSpot_numeric'
)
romano_wolf_correction <- function(t.orig, t.boot) {
abs.t.orig <- abs(t.orig)
abs.t.boot <- abs(t.boot)
abs.t.sorted <- sort(abs.t.orig, decreasing = TRUE)
max.order <- order(abs.t.orig, decreasing = TRUE)
rev.order <- order(max.order)
M <- nrow(t.boot)
S <- ncol(t.boot)
p.adj <- rep(0, S)
p.adj[1] <- mean(apply(abs.t.boot, 1, max) > abs.t.sorted[1])
for (s in seq(2, S)) {
cur.index <- max.order[s:S]
p.init <-
mean(apply(abs.t.boot[, cur.index, drop = FALSE], 1, max) > abs.t.sorted[s])
p.adj[s] <- max(p.init, p.adj[s - 1])
}
p.adj[rev.order]
}
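As a quick sanity check (not part of the original analysis), the step-down correction above can be exercised on simulated null t-values; the inputs below are purely illustrative:

```r
# Illustrative check of romano_wolf_correction: three observed t-statistics
# against 1,000 simulated null t-values per hypothesis.
set.seed(1)
t.obs  <- c(3.0, 1.5, 0.2)
t.null <- matrix(rnorm(1000 * 3), nrow = 1000, ncol = 3)
p.rw   <- romano_wolf_correction(t.obs, t.null)
# The step-down construction enforces monotonicity: after sorting by |t|,
# each adjusted p-value is at least as large as the one before it, so the
# weakest statistic never receives a smaller adjusted p-value than a
# stronger one.
```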
summary_rw_lm <-
function(model,
indices = NULL,
cov.type = "HC2",
num.boot = 10000) {
if (is.null(indices)) {
indices <- 1:nrow(coef(summary(model)))
}
# Grab the original t values.
summary <- coef(summary(model))[indices, , drop = FALSE]
t.orig <- summary[, "t value"]
# Null resampling.
# This is a trick to speed up bootstrapping linear models.
# Here, we don't really need to re-fit linear regressions, which would be a bit slow.
# We know that beta hat ~ N(beta, Sigma), and we have an estimate Sigma hat.
# So we can approximate "null t-values" by
# - Draw beta.boot ~ N(0, Sigma-hat) --- note the 0 here, this is what makes it a *null* t-value.
# - Compute t.boot = beta.boot / sqrt(diag(Sigma.hat))
Sigma.hat <- vcovHC(model, type = cov.type)[indices, indices]
se.orig <- sqrt(diag(Sigma.hat))
num.coef <- length(se.orig)
beta.boot <-
mvrnorm(n = num.boot,
mu = rep(0, num.coef),
Sigma = Sigma.hat)
t.boot <- sweep(beta.boot, 2, se.orig, "/")
p.adj <- romano_wolf_correction(t.orig, t.boot)
result <- cbind(summary[, c(1, 2, 4), drop = F], p.adj)
colnames(result) <-
c('Estimate', 'Std. Error', 'Orig. p-value', 'Adj. p-value')
result
  }
fmla <- paste(outcome, " ~", paste(covariates, collapse = " + "))
# Dividing data into three subsets
indices <-
split(seq(nrow(HTE_data)), sort(seq(nrow(HTE_data)) %% 3))
names(indices) <- c('split', 'est', 'test')
# Fitting the forest
ct.unpruned <- honest.causalTree(
formula = fmla,
# Define the model
data = HTE_data[indices$split, ],
treatment = HTE_data[indices$split, treatment],
est_data = HTE_data[indices$est, ],
est_treatment = HTE_data[indices$est, treatment],
minsize = 1,
# Min. number of treatment and control cases in each leaf
HonestSampleSize = length(indices$est),
# Num obs used in estimation after splitting
# We recommend not changing the parameters below
split.Rule = "CT",
# Define the splitting option
cv.option = "TOT",
# Cross validation options
cp = 0,
# Complexity parameter
split.Honest = TRUE,
# Use honesty when splitting
cv.Honest = TRUE # Use honesty when performing cross-validation
)
## [1] 2
## [1] "CT"
We find, however, that none of the HTE groups in the 17 leaves shows a statistically significant HTE after p-value adjustment.
# Table of cross-validated values by tuning parameter.
ct.cptable <- as.data.frame(ct.unpruned$cptable)
# Obtain optimal complexity parameter to prune tree.
cp.selected <- which.min(ct.cptable$xerror)
cp.optimal <- ct.cptable[cp.selected, "CP"]
# Prune the tree at optimal complexity parameter.
ct.pruned <- ct.unpruned # pruning skipped here; to prune: prune(tree = ct.unpruned, cp = cp.optimal)
# Predict point estimates (on estimation sample)
tau.hat.est <- predict(ct.pruned, newdata = HTE_data[indices$est, ])
# Create a factor column 'leaf' indicating leaf assignment in the estimation set
num.leaves <- length(unique(tau.hat.est))
leaf <-
factor(tau.hat.est,
levels = sort(unique(tau.hat.est)),
labels = seq(num.leaves))
rpart.plot(
x = ct.pruned,
# Pruned tree
type = 3,
# Draw separate split labels for the left and right directions
fallen = TRUE,
# Position the leaf nodes at the bottom of the graph
leaf.round = 1,
# Rounding of the corners of the leaf node boxes
extra = 100,
# Display the percentage of observations in the node
branch = .1,
# Shape of the branch lines
box.palette = "RdBu"
) # Palette for coloring the nodes
# This is only valid in randomized datasets.
fmla <- paste0(outcome, ' ~ ', paste0(treatment, '* leaf'))
if (num.leaves == 1) {
print("Skipping since there's a single leaf.")
} else if (num.leaves == 2) {
# if there are only two leaves, no need to correct for multiple hypotheses
ols <- lm(fmla, data = transform(HTE_data[indices$est, ], leaf = leaf))
coeftest(ols, vcov = vcovHC(ols, 'HC2'))[4, , drop = F]
} else {
# if there are three or more leaves, use Romano-Wolf test correction
ols <- lm(fmla, data = transform(HTE_data[indices$est, ], leaf = leaf))
interact <-
which(sapply(names(coef(ols)), function(x)
grepl(paste0(treatment, ":"), x)))
summary_rw_lm(ols, indices = interact, cov.type = 'HC')
}
## Estimate Std. Error Orig. p-value Adj. p-value
## treatment:leaf2 0.687500 1.408218 0.62599846 0.6229
## treatment:leaf3 1.358333 1.277416 0.28905005 0.5346
## treatment:leaf4 1.400000 1.443976 0.33357412 0.5346
## treatment:leaf5 1.562500 1.346514 0.24742128 0.5346
## treatment:leaf6 1.613636 1.204603 0.18207786 0.4628
## treatment:leaf7 1.826923 1.196683 0.12860192 0.4094
## treatment:leaf8 2.005556 1.172576 0.08891782 0.3334
## treatment:leaf9 2.061404 1.287646 0.11115084 0.3706
## treatment:leaf10 2.062500 1.467330 0.16156239 0.4628
## treatment:leaf11 2.125000 1.213732 0.08168413 0.3334
## treatment:leaf12 2.233333 1.277416 0.08211325 0.3334
## treatment:leaf13 2.450000 1.295036 0.06011809 0.2777
## treatment:leaf14 2.533333 1.312420 0.05514434 0.2599
## treatment:leaf15 2.766667 1.363242 0.04388238 0.2218
## treatment:leaf16 2.884615 1.611712 0.07516980 0.3133
## treatment:leaf17 2.937500 1.321026 0.02741593 0.1562
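Since grf is already loaded above, an alternative to the single honest tree would be a causal forest, which averages over many honest trees and yields a doubly robust average-effect estimate plus per-observation CATE predictions. A minimal sketch, not run as part of the original analysis and assuming `HTE_data`, `covariates`, `outcome`, and `treatment` as defined above:

```r
# Sketch: causal forest on the same covariates (assumes HTE_data as above).
X <- as.matrix(HTE_data[, covariates])
Y <- HTE_data[[outcome]]
W <- HTE_data[[treatment]]
cf <- grf::causal_forest(X, Y, W, seed = 1)
# Doubly robust estimate of the average treatment effect:
grf::average_treatment_effect(cf, target.sample = "all")
# Out-of-bag CATE estimates, e.g. to compare low- vs. high-income groups:
tau.hat <- predict(cf)$predictions
```

A forest would also sidestep the manual sample-splitting above, since honesty and out-of-bag estimation are handled internally.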